Probability-based validation of protein identifications using a modified SEQUEST algorithm.
Abstract
Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to a predicted mass spectrum for an amino acid sequence within a database. SEQUEST is one of the most common software algorithms used for the analysis of peptide tandem mass spectra by using a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the difference between the first- and second-ranked sequences (ΔCn). This value is dependent on the database size, search parameters, and sequence homologies. In this report, we demonstrate the use of a scoring routine (SEQUEST-NORM) that normalizes XCorr values to be independent of peptide size and the database used to perform the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.
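To illustrate why ΔCn depends on the database, here is a minimal sketch. It assumes the conventional definition of ΔCn as the gap between the top two XCorr values divided by the top value. The function name `delta_cn` and all scores are hypothetical, and this is not the SEQUEST-NORM routine, whose formula the abstract does not give.

```python
def delta_cn(xcorrs):
    """Return the normalized gap between the two highest XCorr values.

    Assumes the conventional definition (XCorr_1 - XCorr_2) / XCorr_1,
    where XCorr_1 and XCorr_2 are the first- and second-ranked scores.
    """
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate scores")
    best, second = ranked[0], ranked[1]
    if best <= 0:
        raise ValueError("top-ranked XCorr must be positive")
    return (best - second) / best


# Hypothetical XCorr values for one spectrum searched against a small database.
small_db = [4.0, 2.0, 1.5]

# Same spectrum, larger database that also contains a homologous sequence
# scoring close to the top hit (made-up value 3.6).
large_db = small_db + [3.6]

# The top XCorr is unchanged, but the ΔCn shrinks once the homolog is present.
print(delta_cn(small_db))
print(delta_cn(large_db))
```

In this toy case the best match and its XCorr are identical in both searches, yet ΔCn drops from about 0.5 to about 0.1. That database dependence is the behavior the abstract describes as a limitation of ΔCn.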
Related papers
Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible meth...
Informatics For Protein Identification by Tandem Mass Spectrometry; Focused on Two Most-widely Applied Algorithms, Mascot and SEQUEST
Mass spectrometry (MS) is widely applied for high-throughput proteomics analysis. Large-scale proteome analysis experiments generate massive amounts of data. To search these proteomics data against protein databases, fully automated database search algorithms, such as Mascot and SEQUEST, are routinely employed. At present, it is critical to reduce false positives and false ...
A dataset of human liver proteins identified by protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry.
Proteins from human liver carcinoma Huh7 cells, representing transformed liver cells, and cultured primary human fetal hepatocytes (HFH) and human HH4 hepatocytes, representing nontransformed liver cells, were extracted and processed for proteome analysis. Proteins from stimulated cells (interferon-alpha treatment for the Huh7 and HFH cells and induction of hepatitis C virus [HCV] proteins for ...
Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search s...
DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics.
The components of complex peptide mixtures can be separated by liquid chromatography, fragmented by tandem mass spectrometry, and identified by the SEQUEST algorithm. Inferring a mixture's source proteins requires that the identified peptides be reassociated. This process becomes more challenging as the number of peptides increases. DTASelect, a new software package, assembles SEQUEST identific...
Journal:
- Analytical Chemistry
Volume 74, Issue 21
Publication date: 2002